AWARDS ABSTRACT 


This invention provides a new hierarchical approach for supervised neural 
learning of time dependent trajectories. The modular hierarchical methodology leads 
to architectures which are more structured than fully interconnected networks. The 
networks utilize a general feedforward flow of information and sparse recurrent 
connections to achieve dynamic effects. The advantages include the sparsity of units 
and connections and the modular organization. A further advantage is that learning is 
much more circumscribed than in fully interconnected systems. The present 
invention is embodied by a neural network including a plurality of neural modules each 
having a pre-established performance capability wherein each neural module has an 
output outputting present results of the performance capability and an input for 
changing the present results of the performance capability. For pattern recognition 
applications, the performance capability may be an oscillation capability producing a 
repeating wave pattern as the present results. In the preferred embodiment, each of 
the plurality of neural modules includes a pre-established capability portion and a 
performance adjustment portion connected to control the pre-established capability 
portion. 
BACKGROUND OF THE INVENTION 


Origin of the Invention: 

The invention described herein was made in the perfor- 
mance of work under a NASA contract, and is subject to the 
provisions of Public Law 96-517 (35 USC 202) in which the 
contractor has elected not to retain title. 


Technical Field: 

This invention relates to neural networks and, more par- 
ticularly, to methods and apparatus wherein a modular hi- 
erarchical approach is employed to increase the learning po- 
tential and shorten the learning time for such networks. 


Background Art: 

Artificial neural networks aim to provide complex
information processing, comparable to that of biological systems.

To reach this goal, versatile methods of learning must be
available. That is, the neural networks must learn from some
sort of learning and/or teaching process. Learning, of
course, is a fundamental ability of biological systems. In
the prior art, the most successful approaches to learning
have been the back-propagation and gradient descent
methods. Although these approaches are very powerful on
relatively simple problems, theoretical analysis and
simulations show that they break down as soon as sufficiently
complex problems are considered. A solution applicable to
complex problems has been eagerly anticipated in the neural
network arts.


The reason for this is depicted in greatly simplified form 
in FIG. 1. The typical prior art neural network computing 
system 10 includes a fully interconnected neural network 12. 
There are a plurality of outputs 14 connected to a learning 
function 16. The synaptic weights and neural gains within 
the network 12 can be changed and adjusted by the learning 
function 16 over the lines 18 and 20, respectively. Learning 
takes place according to the techniques of the particular ap- 
proach applied by adjusting the various synaptic weights and 
neural gains within the neural network 12 until the desired 
output response is achieved as a result of known inputs 22. 
Again, this is a very simplistic representation of a complex 
structure and methodology presented only for the proposition 
that the neural network 12 in the prior art is a fully inter- 
connected network that basically starts from scratch with a 
clean slate in the learning process. 


Learning is a fundamental ability of biological systems.
Understanding its principles is also key to the design of
intelligent circuits, computers and machines of various kinds.
To this date, the most successful approach to learning from
an engineering standpoint has been the back-propagation
approach or gradient descent approach. In this framework,
in the course of learning from examples, the parameters of
a learning system, such as a neural network, are adjusted
incrementally so as to optimize by gradient descent a
suitable function measuring the performance of the system at
any given time. Although this approach is very powerful on
relatively simple problems, theoretical analysis and
simulations show that it breaks down as soon as sufficiently
complex problems are considered. Gradient descent learning
applied to an amorphous learning system is bound to fail.
(The present invention described below overcomes this
fundamental limitation.)


An example from the prior art is now described for the 
basic problem of trajectory learning in neural networks. The 
ideas involved, however, extend immediately to more general 
computational problems. 


Consider the problem of synthesizing a neural network
capable of producing a certain given non-trivial trajectory.
To fix the ideas, we can imagine that the model neurons in
the network satisfy the usual additive model equations

$$\frac{du_i}{dt} = -\frac{u_i}{\tau_i} + \sum_j w_{ij} f(u_j) + I_i \tag{1}$$

The learning task is to find the right parameter values, for
instance for the synaptic weights $w_{ij}$, the charging time
constants $\tau_i$ and the amplifier gains, so that the output
units of the network follow a certain prescribed trajectory
$u^*(t)$ over a given time interval $[t_0, t_1]$. For instance,
a typical benchmark trajectory in the literature is a circle
or a figure eight, as in FIG. 2. Networks corresponding to
Equation (1) above have been successfully trained, although
through lengthy computer runs, on figure eights using a form
of gradient descent learning for recurrent networks. Consider
now the problem of learning a more complicated trajectory,
such as a double figure eight (i.e., a set of four loops joined
at one point), as in FIG. 3. Although the task appears only
slightly more complicated, simulations show that a fully
interconnected set of units will not be able to learn this task
by indiscriminate gradient descent learning on all of its
parameters. Thus a different approach is needed.
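
As a rough illustration of the additive model of Equation (1), the
following Python sketch integrates the equations with a forward Euler
step. The function names, the choice of tanh with gain `gamma` as the
amplifier nonlinearity f, the external input `I`, and the step size
are assumptions made for illustration only; the text does not
prescribe an integration scheme.

```python
import numpy as np

def additive_rhs(u, W, tau, gamma, I):
    """Right-hand side of du_i/dt = -u_i/tau_i + sum_j w_ij f(u_j) + I_i.

    f is assumed here to be tanh with gain gamma (an illustrative choice).
    """
    return -u / tau + W @ np.tanh(gamma * u) + I

def simulate(u0, W, tau, gamma, I, dt=0.001, steps=1000):
    """Forward-Euler integration; returns an array of shape (steps + 1, n)."""
    u = np.asarray(u0, dtype=float)
    traj = [u.copy()]
    for _ in range(steps):
        u = u + dt * additive_rhs(u, W, tau, gamma, I)
        traj.append(u.copy())
    return np.array(traj)
```

With all weights and inputs set to zero, each unit simply relaxes
toward rest with its time constant, which gives a quick sanity check
of the integrator before any learning is attempted.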


Biology seems to have overcome the obstacles inherent to 
gradient descent learning through evolution. Learning in bio- 
logical organisms is never started from a tabula rasa. Rather, 
a high degree of structure is already present in the neural cir- 
cuitry of newly born organisms. This structure is genetically 
encoded and the result of evolutionary tinkering over time 
scales several times larger than those of continental drift. 


Little is known of the interaction between the prewired 
structure and the actual learning. One reasonable hypothe- 
sis is that complex tasks are broken up into simpler modules 
and that learning, perhaps in different forms, can operate 
both within and across modules. The modules in turn can 
be organized in a hierarchical way, all the way up to the level 
of nuclei or brain areas. The difficult problem then becomes 
how to find a suitable module decomposition and whether 
there are any principles for doing so (in particular, the
solutions found by biology are probably not unique). One trick
used by evolution seems to have been the duplication, by
error, of a module together with the subsequent evolution of
one of the copies into a new module somehow complementary
to the first one. But this is far from yielding any useful
principle and may, at best, be used in genetic-type algorithms,
where evolutionary tinkering is mimicked in the computer.



As stated earlier, learning is a fundamental ability of bio- 
logical systems. It would seem, therefore, that understanding 
and applying its principles might also be a key to the design 
of intelligent circuits and computers. In other words, to over- 
come the fundamental limitation of the prior art as discussed 
above, there might well be a solution which could be inspired 
by and based on biological networks. Since a high degree of 
structure is already present in the neural circuitry of newly 
born organisms, perhaps one should employ a hierarchical 
and modular approach whereby a certain degree of structure 
is initially introduced in the learning system at "birth".


Wherefore, it is an object of this invention to provide an 
artificial neural network based on the principles of biological 
neural networks. 


It is another object of this invention to provide a neu- 
ral network employing a hierarchical and modular approach 
whereby a certain degree of structure is initially introduced 
in the learning system. 


Other objects and benefits of this invention will become 
apparent from the description which follows hereinafter when 
read in conjunction with the drawing figures which accom- 
pany it. 


SUMMARY OF THE DISCLOSURE 

The present invention includes a hierarchical and modular
approach, directly inspired by biological networks, whereby
a certain degree of structure is introduced in the learning
system. The basic organization of the system consists of a
hierarchy of modules. The lowest levels of the hierarchy serve
as primitives or basic building blocks for the successive levels.


The present invention is embodied in a neural network
including a plurality of neural modules each having a
pre-established performance capability wherein each neural
module has an output outputting present results of the
performance capability and an input for changing the present
results of the performance capability. For pattern recognition
applications, the performance capability may be an
oscillation capability producing a repeating wave pattern as
the present results.


In the preferred embodiment, each of the plurality of
neural modules includes a pre-established capability portion
having an output therefrom, and a performance adjustment
portion connected to control the pre-established capability
portion.


Further in the preferred embodiment, a first group of the
plurality of neural modules is on a first hierarchical level;
and, a second group of the plurality of neural modules is
on a second hierarchical level. Additionally, the first group
of the plurality of neural modules controls the second group
of the plurality of neural modules. For pattern recognition
applications and the like, the first group of the plurality of
neural modules produces a first portion of a desired time
dependent pattern; and, the second group of the plurality of
neural modules receives the first portion and forms a second
portion of the desired time dependent pattern therefrom.


In a three level hierarchy embodiment, a first group of
the plurality of neural modules is on a first hierarchical level;
a second group of the plurality of neural modules is on a
second hierarchical level; a third group of the plurality of
neural modules is on a third hierarchical level; the first group
of the plurality of neural modules controls the second group of
the plurality of neural modules; the second group of the
plurality of neural modules controls the third group of the
plurality of neural modules; the first group of the plurality of
neural modules produces a first portion of a desired time
dependent pattern; the second group of the plurality of neural
modules receives the first portion and forms a second portion
of the desired time dependent pattern therefrom; and, the third
group of the plurality of neural modules receives the second
portion and forms the desired time dependent pattern
therefrom.



BRIEF DESCRIPTION OF THE DRAWINGS 

FIG. 1 is a greatly simplified drawing of a prior art neural
network system.

FIG. 2 is a drawing of a single figure eight pattern.

FIG. 3 is a drawing of a double figure eight pattern.

FIG. 4 is a block diagram of a neural module as employed
in the present invention.

FIG. 5 is a greatly simplified drawing in the manner of
FIG. 1 showing how in the present invention the neural
network is composed of neural modules as in FIG. 4.

FIG. 6 is a functional block diagram of a hierarchical
structure of neural modules as may be employed in the present
invention.

FIG. 7 is a functional block diagram as in FIG. 6 wherein
the neural modules include adjustable oscillators and
control modules as may be employed with the present invention
when generating time dependent patterns.

FIG. 8 is the structure of FIG. 7 with the outputs of each
of the modules shown when in the process of generating the
double figure eight pattern of FIG. 3.

FIG. 9 depicts a target oval against the actual oval
produced by Layer 1 modules of FIG. 8 in laboratory tests
thereof.

FIG. 10 depicts a target single figure eight against the
actual figure eight produced by Layer 2 modules of FIG. 8 in
laboratory tests thereof.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT 


One common use of neural network systems is the
generation of desired patterns which can then be used for such
applications as pattern recognition, and the like. Since the
present invention as tested to date has been for such
applications, its structure for pattern generation will be
employed as the example hereinafter and in the accompanying
drawings. It is to be understood, however, that the invention is
not limited to this single use and variations within the scope
and spirit of this disclosure are to be considered as part of
the invention being disclosed and to be covered hereby and
the claims appended hereto.

The basic organization of the solution described herein
consists of a hierarchy of modules, where each module can
be viewed essentially as an oscillator. The modules, in turn,
are organized in a hierarchical way. All the modules within
one level of the hierarchy control the output of the modules
located in the previous layer. This solution leads to
architectures which are more structured than fully interconnected
networks, with a general feedforward flow of information and
sparse recurrent connections to achieve dynamic effects. The
sparsity of the connections as well as the modular
organization makes the hardware implementation of the methodology
very easy and attractive. The approach presented here has
been applied to a simple example of trajectory learning of
a semi-figure eight. The principles involved extend
immediately to more general computational problems.

A single module 24 of a neural network according to the
present invention is depicted in FIG. 4. Unlike the fully
interconnected structure of prior art neural networks, which
has no inherent capability, the module 24 comprises a
pre-established capability 26 and a performance adjustment 28
of that capability. There is an output line 30 outputting the
instantaneous results of the capability 26 and an input line 32
which allows the performance adjustment 28 to be modified
by a previous layer module. As we will see shortly, in our
specific example the capability 26 is an oscillator while the
adjustment 28 is an ability to adjust the parameters of the
oscillator.
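
The structure of module 24 can be pictured with a small data-structure
sketch. The class name `NeuralModule`, the parameter dictionary, and
the idealized sine-wave capability are hypothetical names chosen for
illustration; they are not taken from the disclosure, which realizes
the capability as a neural oscillator circuit.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict
import math

@dataclass
class NeuralModule:
    """Hypothetical stand-in for module 24: a fixed capability plus adjustable parameters."""
    # Pre-established capability 26: maps (time, parameters) to an output value.
    capability: Callable[[float, Dict[str, float]], float]
    # Parameters that the performance adjustment 28 is allowed to change.
    params: Dict[str, float] = field(default_factory=dict)

    def output(self, t: float) -> float:
        """Output line 30: the instantaneous result of the capability."""
        return self.capability(t, self.params)

    def adjust(self, **updates: float) -> None:
        """Input line 32: lets another module modify the adjustable parameters."""
        self.params.update(updates)

def sine_oscillator(t: float, p: Dict[str, float]) -> float:
    """Idealized oscillator capability (assumed form, for illustration only)."""
    return p["amplitude"] * math.sin(2.0 * math.pi * t / p["period"])

module = NeuralModule(sine_oscillator, {"amplitude": 1.0, "period": 4.0})
```

In this picture, a controlling module would call `adjust` on the
modules it controls, while downstream consumers only ever read
`output`.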


A system 10' according to the present invention is
depicted in FIG. 5 wherein the neural network 12 comprises a
plurality of neural modules 24 which can be connected in a
hierarchical structure as necessary. A three level structure of
modules 24 is depicted in FIG. 6. The same three level
structure designating the modules 24 as adjustable oscillators is
shown in FIG. 7. The outputs of the modules 24 when
forming a double figure eight as the ultimate output, according to
the example now to be described in detail, are shown in FIG. 8.


The inventors herein have taken inspiration from the
biological analogy discussed previously herein to tackle the
problem of creating specific complex trajectories in a neural
network. Although it is difficult at this stage to keep a close
analogy with biology, it may be useful to think of the
problem of central pattern generation or motor control in natural
organisms. In order to construct a neural network capable
of producing a double figure eight, we introduce a certain
degree of organization in the system prior to any learning.
The basic organization of the system consists of a hierarchy
of modules. In this particular example, each module 24 can
be viewed essentially as an oscillator. The modules, in turn,
are organized in a hierarchical way as described above. For
the time being, all the modules 24 within one level of the
hierarchy control the output of the modules 24 located in the
previous layer.


At the bottom of the hierarchy, in the first level, there is 
a family of simple and possibly independent modules, each 
one corresponding to a circuit with a small number of units 
capable of producing some elementary trajectory, such as a 
sinusoidal oscillation. In the case of the additive model, these 
could be simple oscillator rings with two or three neurons, an 
odd number of inhibitory connections and sufficiently high 
gains. Thus, in one example, the first level of the hierarchy 
could contain four oscillator rings, one for each loop of the 
target trajectory, as depicted in FIG. 8. The parameters in 
each one of these four modules 24 can be adjusted, by gra- 
dient descent or random descent or some other optimization 
procedure, in order to match each one of the loops in the 
target trajectory. 


The second level of the pyramid preferably contains two
control modules. Each one of these modules 24 controls a
distinct pair of oscillator networks from the first level, so that
each control network in the second level ends up producing
a simple figure eight (as shown in FIG. 8). Again, the
control networks in level two can be oscillator rings and their
parameters can be adjusted. In particular, after the
learning process is completed, they should be operating in their
high-gain regimes and have a period equal to the sum of the
periods of the circuits each one controls.


Finally, the third layer consists of another oscillatory and
adjustable module 24 which controls the two modules 24 in
the second level so as to produce a double figure eight. The
third layer module 24 must also end up operating in its
high-gain regime with a period equal to four times the period of
the oscillators in the first layer. In general, the final output
trajectory is also a limit cycle because it is obtained by
superposition of limit cycles in the various modules. If the
various oscillators relax to their limit cycles independently of
one another, it is preferable to provide for adjustable delays
between the various modules 24 in order to get the proper
harmony among the various phases. In this way, a sparse
network with 20 units or so can be constructed which can
successfully execute a double figure eight. The importance
of the effects of delays and adjustable delays in these
architectures and their ubiquitous presence in natural neural systems
has also led us to conduct an analytical study of the effect of
delays on neural dynamics (especially oscillatory properties)
and learning. The main result of our study is that delays
tend to increase the period of oscillations and broaden the
spectrum of possible frequencies in a quantifiable way. A
recurrent back-propagation learning algorithm can be derived
for adjustable delays.
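
The period relations stated above for the control modules can be
written out explicitly. Writing $T_1$ for the common period of the
four first-level oscillators (equal periods are assumed here, as
implied by the statement that the third-level period is four times
that of the first layer), and $T_2$, $T_3$ for the periods of the
second-level and third-level modules, a worked version reads:

$$T_2 = T_1 + T_1 = 2\,T_1, \qquad T_3 = T_2 + T_2 = 4\,T_1 .$$

For instance, with $T_1 = 2\pi$ this gives $T_2 = 4\pi$ and
$T_3 = 8\pi$, so each control module completes one cycle in the time
the modules it controls need to trace their loops one after the other.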



There are actually different possible neural network
realizations depending on how the action of the control modules
24 is implemented. For instance, if the control units are
gating the connections between corresponding layers, this
amounts to using higher order units in the network. The
number of layers in the network then becomes a function of
the order of the units one is willing to use. Alternatively,
one could assume the existence of a fast weight dynamics on
certain connections governed by a corresponding set of
differential equations.


It is clear that this approach, which combines a modular
hierarchical architecture together with some simple form of
learning, can be extended to general trajectories. At the very
least, one could always use Fourier analysis to decompose a
target trajectory into a superposition of sinusoidal
oscillations of different frequencies and use, in the first level of
the hierarchy, a correspondingly large bank of oscillator networks
(although this decomposition may not be the most
economical). One could also use damped oscillators to perform some
sort of wavelet decomposition. Although we believe that
oscillators with limit cycles present several attractive
properties (such as stability, short transients, biological relevance,
for example), one can conceivably use completely different
circuits as building blocks in each module. Another
observation is that the problem of synthesizing a network capable
of certain given trajectories is more general than what would
seem at first sight. In fact, any computation can be viewed
as some sort of trajectory in the state space of a computing
device, whether digital or analog.
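
The Fourier decomposition mentioned above can be sketched as
follows. This is only an illustration under stated assumptions: the
trajectory is taken to be one coordinate of a periodic signal sampled
uniformly over exactly one period, each retained harmonic stands in
for one idealized first-level oscillator, and the names `fourier_bank`
and `reconstruct` as well as the test signal are hypothetical.

```python
import numpy as np

def fourier_bank(samples, n_oscillators):
    """Return (harmonic indices, complex coefficients) of the strongest harmonics.

    `samples` holds one full period of a real-valued trajectory, uniformly sampled.
    """
    coeffs = np.fft.rfft(samples) / len(samples)
    # Keep the harmonics with the largest magnitude, one per oscillator.
    keep = np.argsort(np.abs(coeffs))[::-1][:n_oscillators]
    return keep, coeffs[keep]

def reconstruct(harmonics, coeffs, n_samples):
    """Sum the selected sinusoids back into a sampled trajectory."""
    k = np.arange(n_samples)
    out = np.zeros(n_samples)
    for h, c in zip(harmonics, coeffs):
        # The DC term and, for even n_samples, the Nyquist term appear once
        # in the real spectrum; every other harmonic appears twice.
        scale = 1.0 if h == 0 or 2 * h == n_samples else 2.0
        out += scale * (c * np.exp(2j * np.pi * h * k / n_samples)).real
    return out

t = np.linspace(0.0, 2.0 * np.pi, 256, endpoint=False)
target = np.sin(t) + 0.5 * np.sin(2.0 * t) + 0.25 * np.cos(3.0 * t)
harmonics, coeffs = fourier_bank(target, 3)
approx = reconstruct(harmonics, coeffs, len(t))
```

For a trajectory with sharp corners many more harmonics would be
needed, which is one way to read the remark that this decomposition
may not be the most economical.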


The modular hierarchical approach of the present
invention leads to architectures which are more structured than
fully interconnected networks, with a general feedforward
flow of information and sparse recurrent connections to achieve
dynamical effects. The sparsity of units and connections is
an attractive feature for hardware design, and so are the
modular organization and the fact that learning is much more
circumscribed than in fully interconnected systems. In these
architectures, some form of learning remains essential, for
instance to fine tune each one of the modules. This, in
itself, is a much easier task than the one a fully interconnected
and random network would have been faced with. It can be
solved by gradient or random descent or other methods.



Example of Numerical Simulations: 

The new learning paradigm, presented in the preceding
section, has been applied to the problem of learning a figure
eight trajectory. Results referring to this problem obtained
using prior techniques can be found in the literature.


We assume that the desired trajectory of a semi-figure
eight is composed of two circles and given by:

$$D_1 = C_1 [x_{10} + \cos(t)] + (1 - C_1)[y_{10} - \cos(t)] \tag{2a}$$

$$D_2 = C_1 [x_{20} + \sin(t)] + (1 - C_1)[y_{20} + \sin(t)] \tag{2b}$$

in which $C_1$ is a square wave with a period of $4\pi$, given by
the following equation:

$$C_1 = \mathrm{sign}[\sin(t/2)] \tag{3}$$

and $(x_{10}, x_{20})$ and $(y_{10}, y_{20})$ are the coordinates
of the centers of the left and right circles, respectively.
Plotting $D_1$ vs. $D_2$ will produce the desired semi-figure eight.
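
The target trajectory of Equations (2a), (2b) and (3) can be
generated numerically as sketched below. Two assumptions are made
and should be read as such. First, taken literally with
$C_1 = \pm 1$ the weight $(1 - C_1)$ takes the values 0 and 2; the
sketch assumes the intended weights are the 0/1 gates
$(1 \pm C_1)/2$, in line with the factor $0.5[1 \pm V_C(1)]$ used in
Equations (6a) and (6b). Second, the circle centers $(-1, 0)$ and
$(1, 0)$ are hypothetical values chosen so that two unit circles
touch at the origin.

```python
import numpy as np

def semi_figure_eight(t, left=(-1.0, 0.0), right=(1.0, 0.0)):
    """Target (D1, D2) built from two unit circles that touch at one point."""
    x10, x20 = left
    y10, y20 = right
    c1 = np.sign(np.sin(t / 2.0))      # Equation (3): square wave, period 4*pi
    g = 0.5 * (1.0 + c1)               # assumed 0/1 gate derived from C1
    d1 = g * (x10 + np.cos(t)) + (1.0 - g) * (y10 - np.cos(t))
    d2 = g * (x20 + np.sin(t)) + (1.0 - g) * (y20 + np.sin(t))
    return d1, d2

t = np.linspace(0.0, 4.0 * np.pi, 2000, endpoint=False)
d1, d2 = semi_figure_eight(t)
```

Under these assumptions the left circle is traced during the first
half of the $4\pi$ period and the right circle, in the opposite
sense, during the second half, with the two meeting at the origin.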


The basic module of the hierarchical approach for this
trajectory is a simple oscillatory ring network with four
neurons. The activation dynamics of each unit in the module is
given by:

$$\frac{du_i}{dt} = -\frac{u_i}{\tau_i} + w_i V_{i-1} \tag{4}$$

where $V_0 = V_4$ and $V_i$ is the output of neuron $i$ given by:

$$V_i = \tanh(\gamma_i u_i) \tag{5}$$

An odd number of inhibitory connections is required for
stable oscillations. At this stage, for simplicity, we assume that
$w_i = w$ for $i = 1, 3, 4$, $w_2 = -w$, and $\tau_i = \tau$,
$\gamma_i = \gamma$ for $i = 1, \ldots, 4$. The module is trained
to produce a circle through a sinusoidal wave with a period of
$2\pi$. The initial values of the network parameters, i.e., $w$,
$\tau$ and $\gamma$, are set to one at the beginning of the learning
procedure. To update the network parameters, a gradient descent
algorithm based upon the forward propagation of the error is used.
After the training, the network parameters converge to the
following values: $w = 1.025$, $\tau = 0.972$ and $\gamma = 1.526$.
With these values, after a brief transition period, the module
converges to a limit cycle where each unit has a quasi-sinusoidal
activation. The phase shift between two consecutive neurons is
about $\pi/4$. Therefore, plotting the activity of neurons 1 and 3
in the module against each other will produce a circle which
is close to the desired one, as illustrated in FIG. 9.
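
The behavior of the trained module can be checked qualitatively with
a short simulation of the ring dynamics, du_i/dt = -u_i/tau +
w_i V_{i-1} with V_i = tanh(gamma u_i), using the parameter values
reported above. The integration scheme (fourth-order Runge-Kutta),
the step size, the initial condition, and all function names are
assumptions made for this sketch; the training procedure itself is
not reproduced.

```python
import numpy as np

W_TRAINED, TAU, GAMMA = 1.025, 0.972, 1.526   # values reported in the text

def ring_rhs(u, w=W_TRAINED, tau=TAU, gamma=GAMMA):
    """du_i/dt = -u_i/tau + w_i * V_{i-1}, with V = tanh(gamma * u) and V_0 = V_4."""
    weights = np.array([w, -w, w, w])          # w_2 = -w: one inhibitory connection
    v_prev = np.roll(np.tanh(gamma * u), 1)    # V_{i-1}; neuron 1 is driven by neuron 4
    return -u / tau + weights * v_prev

def simulate_ring(t_end=200.0, dt=0.02):
    """Runge-Kutta integration; returns times and outputs V of shape (steps, 4)."""
    n = int(round(t_end / dt))
    u = np.array([0.1, 0.0, 0.0, 0.0])         # small kick away from the rest state
    out = np.empty((n, 4))
    for k in range(n):
        k1 = ring_rhs(u)
        k2 = ring_rhs(u + 0.5 * dt * k1)
        k3 = ring_rhs(u + 0.5 * dt * k2)
        k4 = ring_rhs(u + dt * k3)
        u = u + dt * (k1 + 2.0 * k2 + 2.0 * k3 + k4) / 6.0
        out[k] = np.tanh(GAMMA * u)
    return np.arange(1, n + 1) * dt, out

t, v = simulate_ring()
```

Plotting `v[:, 0]` against `v[:, 2]` after the initial transient
should give a closed, roughly circular curve, in line with the
phase-shift argument above.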


At the second level of the hierarchy is the control
module. This module is also chosen to be a simple oscillatory
ring network with four neurons. This network operates in
the high gain regime and its period is twice that of the basic
modules, i.e., $4\pi$. The network parameters at the beginning
of the learning are set to $w = 0.9$, $\gamma = 10$, and $\tau = 2.58$.




The overall network has two outputs at any time, $Z_1$ and
$Z_2$. Their values are given by:

$$Z_1 = 0.5\{[1 + V_C(1)]\,[x_{10} + V_{N1}(1)] + [1 - V_C(1)]\,[y_{10} + V_{N1}(3)]\} \tag{6a}$$

$$Z_2 = 0.5\{[1 + V_C(1)]\,[x_{20} + V_{N2}(1)] + [1 - V_C(1)]\,[y_{20} + V_{N2}(3)]\} \tag{6b}$$

in which $V_{N1}(i)$ and $V_{N2}(i)$ are the outputs of the $i$th
neuron in the first and second modules in the first level of the
hierarchy, respectively, and $V_C(1)$ is the output of the first
neuron in the control module. FIG. 10 shows the semi-figure
eight obtained by plotting $Z_1$ vs. $Z_2$.
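
The gating role of the control module in Equations (6a) and (6b)
can be illustrated with idealized signals. In the sketch below the
control output and the first-level module outputs are replaced by
assumed stand-ins (an ideal square wave and pure sinusoids) rather
than simulated neurons, and the circle centers are hypothetical
values; only the combination rule itself is taken from the
equations.

```python
import numpy as np

def gated_output(vc, v_a, v_b, center_a, center_b):
    """One coordinate of Equations (6a)/(6b): a soft switch between two module outputs."""
    return 0.5 * ((1.0 + vc) * (center_a + v_a) + (1.0 - vc) * (center_b + v_b))

t = np.linspace(0.0, 4.0 * np.pi, 2000, endpoint=False)
vc = np.sign(np.sin(t / 2.0))     # idealized high-gain control output V_C(1), period 4*pi
# Idealized stand-ins for the first-level module outputs (assumed, not simulated).
vn1_1, vn1_3 = np.cos(t), -np.cos(t)
vn2_1, vn2_3 = np.sin(t), np.sin(t)
x10, x20, y10, y20 = -1.0, 0.0, 1.0, 0.0     # hypothetical circle centers

z1 = gated_output(vc, vn1_1, vn1_3, x10, y10)
z2 = gated_output(vc, vn2_1, vn2_3, x20, y20)
```

When the control output saturates at +1 only the first bracket
contributes, and at -1 only the second, so a control oscillator in
its high-gain regime switches the network output from one circle to
the other once per half period.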


The convergence time of different modules to their limit
cycle may vary. Therefore, it is essential to have a
synchronization mechanism that aligns the activity of different units
at various modules and levels. One such mechanism that has
been adopted in this example is based upon time delays. The
values of these delays are adjusted by a gradient descent
approach such that the network outputs are in harmony with
the desired output.
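
The idea of adjusting a delay by gradient descent can be
illustrated on a single idealized oscillator. The sketch below is
not the learning rule of the disclosure, whose details are not given
here; it assumes a pure sinusoidal module output, a mean squared
error against the desired output, and a hypothetical misalignment of
0.8 time units, and it only converges when the initial delay lies
within half a period of the correct one.

```python
import numpy as np

def fit_delay(target, t, period, delay=0.0, lr=0.5, n_iter=200):
    """Gradient descent on the delay d of sin(2*pi*(t - d)/period) to match `target`."""
    omega = 2.0 * np.pi / period
    for _ in range(n_iter):
        phase = omega * (t - delay)
        err = np.sin(phase) - target
        # d/d(delay) of mean(err**2) = mean(2 * err * (-omega) * cos(phase))
        grad = np.mean(2.0 * err * (-omega) * np.cos(phase))
        delay -= lr * grad
    return delay

t = np.linspace(0.0, 4.0 * np.pi, 1000, endpoint=False)
true_delay = 0.8                                   # hypothetical misalignment
target = np.sin(t - true_delay)
estimated = fit_delay(target, t, period=2.0 * np.pi)
```

In a full hierarchy one such delay would be adjusted between each
pair of connected modules, so that their phases line up with the
desired output.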

In summary, the invention provides a new hierarchical
approach for supervised neural learning of time dependent
trajectories. The modular hierarchical methodology leads to
architectures which are more structured than fully
interconnected networks, with a general feedforward flow of
information and sparse recurrent connections to achieve dynamical
effects. The sparsity of the connections as well as the
modular organization makes the hardware implementation of the
methodology very easy and attractive. This approach has
been applied to an example of trajectory learning of a
semi-figure eight.

While the invention has been described in detail by spe- 
cific reference to preferred embodiments, it is understood that 
variations and modifications thereof may be made without 
departing from the true scope of the invention. 


A NEURAL NETWORK WITH MODULAR 
HIERARCHICAL LEARNING 
